{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 测试 extract_text_from_excel 函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "from autocoder.rag.loaders.excel_loader import extract_text_from_excel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 准备测试用的Excel文件\n",
    "\n",
    "请确保在运行下面的代码之前,你已经准备好了一个测试用的Excel文件。\n",
    "将文件路径替换为你实际的Excel文件路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 替换为你的Excel文件路径\n",
    "excel_file_path = '/Users/allwefantasy/data/yum/schema/schema.xlsx'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 测试 extract_text_from_excel 函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracted sheets:\n",
      "\n",
      "Sheet: /Users/allwefantasy/data/yum/schema/schema.xlsx#数据资产\n",
      "Content:\n",
      "\n",
      "\"1、表描述格式：1）是一张什么行为的表；2）每天分区存储什么数据；3）这张表的粒度是什么\n",
      "2、字段中文名格式：1）注明分区字段；2）日期格式，比如YYYMMDD；3）维度字段枚举值表示的含义，按“枚举值-含义”的样式\",\"\",\"\",\"\"\n",
      "\"示例：\",\"\",\"\",\"\"\n",
      "\"表名称\",\"ads_ph_fact_cpos_tc_v2_day\",\"表描述\",\"订单信息表，用于记录每一笔进入厨房制作的订单，已清洗剔除异常订单\"\n",
      "\"字段序号\",\"字段英文名\",\"字段类型\",\"字段中文名\"\n",
      "\"1\",\"part_dt\",\"string\",\"分区标识，YYYYMMDD\"\n",
      "\"2\",\"storecode\",\"string\",\"餐厅编号\"\n",
      "\"3\",\"businessdate\",\"string\",\"业务日期，YYYY-MM-DD\"\n",
      "\"4\",\"ocordernumber\",\"string\",\"订单唯一编号\"\n",
      "\"5\",\"businesstype\",\"string\",\"交易类型，1-堂食，2、5-都是外带，3、4、7、8-都是外送，6-续杯，通常不参与统计\"\n",
      "\"6\",\"orderreceivetime\",\"string\",\"消费者下单时间，YYYY-MM-DD HH:MM:SS\"\n",
      "\"7\",\"firstitemupscreentime\",\"string\",\"订单中产品首次上屏时间\"\n",
      "\"8\",\"firstitemdownscreentime\",\"string\",\"订单中产品首次下屏时间\"\n",
      "\"9\",\"lastitemdownscreentime\",\"string\",\"订单中产品最后一次下屏时间\"\n",
      "\"10\",\"isbakeorder\",\"string\",\"是否包含烤制产品\"\n",
      "\"11\",\"ordercompletedtime\",\"string\",\"订单完成时间，单位：秒\"\n",
      "\"12\",\"ordercompletednum\",\"string\",\"订单完成数量\"\n",
      "\"13\",\"intervaltime\",\"string\",\"产品下屏时间间隔，最后一个产品的下屏时间减去第一个产品的下屏时间\"\n",
      "\"14\",\"receive_time_int\",\"string\",\"接收订单时间的半小时标志，如1730\"\n",
      "\"15\",\"receive_time_type\",\"string\",\"接收订单时间的半小时时间，如17:30\"\n",
      "\"表名称\",\"ads_ph_fact_cpos_item_v2_day\",\"表描述\",\"订单中产品制作信息表，用于记录每一个工作台中制作的产品的制作开始制作结束时间和数量，已清洗剔除异常产品\"\n",
      "\"字段序号\",\"字段英文名\",\"字段类型\",\"字段中文名\"\n",
      "\"1\",\"part_dt\",\"string\",\"分区标识，YYYYMMDD\"\n",
      "\"2\",\"storecode\",\"string\",\"餐厅编号\"\n",
      "\"3\",\"businessdate\",\"string\",\"业务日期，YYYY-MM-DD\"\n",
      "\"4\",\"ocordernumber\",\"string\",\"订单唯一编号\"\n",
      "\"5\",\"businesstype\",\"string\",\"交易类型，1-堂食，2、5-都是外带，3、4、7、8-都是外送，6-续杯，通常不参与统计\"\n",
      "\"6\",\"orderreceivetime\",\"string\",\"消费者下单时间，YYYY-MM-DD HH:MM:SS\"\n",
      "\"7\",\"ticketno\",\"string\",\"厨房内流转票据号\"\n",
      "\"8\",\"workstationtype\",\"string\",\"工作台类型标识，1-烤制台，2-切饼台，3-制饼台，4-烤制台，5-炸制台，6-水吧，7-沙拉\"\n",
      "\"9\",\"workstationname\",\"string\",\"工作站名称\"\n",
      "\"10\",\"productcode\",\"string\",\"产品编号\"\n",
      "\"11\",\"categoryname\",\"string\",\"产品类别\"\n",
      "\"12\",\"productname\",\"string\",\"产品名称\"\n",
      "\"13\",\"productnumber\",\"string\",\"制作产品数量\"\n",
      "\"14\",\"upscreentime\",\"string\",\"产品上屏时间\"\n",
      "\"15\",\"downscreentime\",\"string\",\"产品下屏时间\"\n",
      "\"16\",\"isbakeorder\",\"string\",\"是否烤制产品，Y-是，N-不是\"\n",
      "\"17\",\"ordercompletedtiming\",\"string\",\"订单完成时间，单位：秒\"\n",
      "\"18\",\"receive_time_int\",\"string\",\"接收订单时间的半小时标志，如1730\"\n",
      "\"19\",\"receive_time_type\",\"string\",\"接收订单时间的半小时时间，如17:30\"\n",
      "\"表名称\",\"edw_dim_dp_c_user_for_day\",\"表描述\",\"营运架构表\"\n",
      "\"字段序号\",\"字段英文名\",\"字段类型\",\"字段中文名\"\n",
      "\"1\",\"guid\",\"string\",\"流水号\"\n",
      "\"2\",\"brand_code\",\"string\",\"品牌编号，1-肯德基，2-必胜客\"\n",
      "\"3\",\"store_code\",\"string\",\"餐厅编号\"\n",
      "\"4\",\"store_name\",\"string\",\"餐厅名称\"\n",
      "\"5\",\"rgm_psid\",\"string\",\"餐厅经理psid（员工号）\"\n",
      "\"6\",\"rgm_dept_id\",\"string\",\"餐厅经理部门编号\"\n",
      "\"7\",\"rgm_employee_name\",\"string\",\"餐厅经理姓名\"\n",
      "\"8\",\"am_psid\",\"string\",\"AM PSID（员工号）\"\n",
      "\"9\",\"am_dept_id\",\"string\",\"AM部门节点编号\"\n",
      "\"10\",\"am_employee_name\",\"string\",\"AM姓名\"\n",
      "\"11\",\"dm_psid\",\"string\",\"DM PSID（员工号）\"\n",
      "\"12\",\"dm_dept_id\",\"string\",\"DM部门节点编号\"\n",
      "\"13\",\"dm_employee_name\",\"string\",\"DM姓名\"\n",
      "\"14\",\"om_psid\",\"string\",\"OM PSID（员工号）\"\n",
      "\"15\",\"om_dept_id\",\"string\",\"OM部门节点编号\"\n",
      "\"16\",\"om_employee_name\",\"string\",\"OM姓名\"\n",
      "\"17\",\"mm_psid\",\"string\",\"MM PSID（员工号）\"\n",
      "\"18\",\"mm_dept_id\",\"string\",\"MM部门节点编号\"\n",
      "\"19\",\"mm_employee_name\",\"string\",\"MM姓名\"\n",
      "\"20\",\"has_child\",\"string\",\"子母店标示，N-普通店，C-子店，P-母店\"\n",
      "\"21\",\"market_name\",\"string\",\"市场名称\"\n",
      "        \n",
      "--------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# 调用函数并获取结果\n",
    "result = extract_text_from_excel(excel_file_path)\n",
    "\n",
    "# 打印结果\n",
    "print(\"Extracted sheets:\")\n",
    "for sheet_name, content in result:\n",
    "    print(f\"\\nSheet: {sheet_name}\")\n",
    "    print(\"Content:\")\n",
    "    print(content)\n",
    "    print(\"-\" * 50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 结果分析\n",
    "\n",
    "上面的代码会打印出Excel文件中每个sheet的名称和内容。你可以检查输出结果,确保:\n",
    "\n",
    "1. 所有的sheet都被正确提取\n",
    "2. 每个sheet的内容是完整和正确的\n",
    "3. 内容的格式符合预期(例如,单元格之间用逗号分隔,行之间有换行)\n",
    "\n",
    "如果发现任何问题,你可能需要回到`extract_text_from_excel`函数的实现,进行调试或修改。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "byzerllm",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
